General and specific patterns of cortical gene expression as spatial correlates of complex cognitive functioning

Abstract Gene expression varies across the brain. This spatial patterning denotes specialised support for particular brain functions. However, the way that a given gene's expression fluctuates across the brain may be governed by general rules. Quantifying patterns of spatial covariation across genes would offer insights into the molecular characteristics of brain areas supporting, for example, complex cognitive functions. Here, we use principal component analysis to separate general and unique gene regulatory associations with cortical substrates of cognition. We find that the region‐to‐region variation in cortical expression profiles of 8235 genes covaries across two major principal components: gene ontology analysis suggests these dimensions are characterised by downregulation and upregulation of cell‐signalling/modification and transcription factors. We validate these patterns out‐of‐sample and across different data processing choices. Brain regions more strongly implicated in general cognitive functioning (g; 3 cohorts, total meta‐analytic N = 39,519) tend to be more balanced between downregulation and upregulation of both major components (indicated by regional component scores). We then identify a further 29 genes as candidate cortical spatial correlates of g, beyond the patterning of the two major components (|β| range = 0.18 to 0.53). Many of these genes have been previously associated with clinical neurodegenerative and psychiatric disorders, or with other health‐related phenotypes. The results provide insights into the cortical organisation of gene expression and its association with individual differences in cognitive functioning.

• Regional spatial correlations between cognitive-morphometry associations and the components of gene expression are calculated.Then, controlling for the two major dimensions, we identify a further 29 specific genes preferentially expressed in g regions.

| INTRODUCTION
In any given cell, genes that are required for that cell's function are expressed.Therefore, it is tenable that observed regional variations in the expression of genes across the brain reflect location-pertinent cellular processes critical for functioning.Information about regional gene expression profiles across the cerebral cortex has been recently used to infer substrates of brain development, maintenance, and health (Darby et al., 2016;Parker et al., 2020;Shin et al., 2018;Vidal-Pineiro et al., 2020).This is achieved by comparing the spatial agreement between the brain regional expression profiles of individual genes or gene sets with the brain regional associations with a phenotype of interest.For example, which specific genes or gene sets are more highly expressed in brain regions that are most strongly related to a particular phenotype of interest (Writing Committee for the Attention-Deficit/Hyperactivity Disorder, Autism Spectrum Disorder, Bipolar Disorder, Major Depressive Disorder, Obsessive-Compulsive Disorder, and Schizophrenia ENIGMA Working Groups, 2021)?This approach, while powerful, potentially suffers from confounding by association.That is, for example, the expression of an individual gene might show a correlation with a phenotype because it reflects general rules that govern the spatial variation in the expression of very many genes over the brain's cortex, rather than something unique to the specific gene in question.There are general dimensions of spatial variation in gene expression covariance, demonstrating shared covariation in expression patterns across multiple genes, across the human body (Lukk et al., 2010), and within multiple organs (Lenz et al., 2016).This includes across the human cortex (Bhaduri et al., 2021), where general dimensions of gene expression have previously been linked to in-vivo MRI estimates of cortical structural anatomy (Burt et al., 2018), and to functional MRI-derived neurocognitive associations (Hansen et al., 2021;Wagstyl et al., 2022).It is therefore critical to control for general dimensions along which regional variation in gene expression covary when seeking gene-specific associations.Because much of our information on gene expression patterns in the brain (with sufficient regional fidelity for these questions) comes from relatively few donors, it is also critical to seek out-of-sample replication.Here, we unite micro-(gene expression) and macro-level (morphometry) information about the brain, to inform the underlying molecular neurobiology of complex cognitive functioning.
General cognitive functioning, or 'g', is a robust and wellreplicated index of individual differences in cognitive functioning, capturing variance in reasoning, planning, problem-solving, some aspects of memory, processing speed and abstract thinking (Deary, 2012;Panizzon et al., 2014).It is associated with educational attainments (Bijwaard et al., 2015), life achievements (Strenze, 2007), health (Wraw et al., 2015) and lifespan (Batty et al., 2007;Calvin et al., 2011).Regions of the brain proposed to support general cognitive functioning, or 'g' (and which relate to individual differences therein), have been identified via an array of methods including resting state fMRI (Dubois et al., 2018), structural and functional connectivity (Zimmermann et al., 2018), lesion studies (Barbey et al., 2012), post mortem brain studies (Dawe et al., 2018) and genetic information (Davies et al., 2018).These brain regions overlap substantially with those associated with other summary cognitive constructs, such as executive functioning (Camilleri et al., 2018).Macrostructural cortical measures provide some convergent evidence for a specific patterning of brain regional g-correlates, particularly highlighting parieto-frontal regions (Cox et al., 2019;Jung & Haier, 2007).However, debate remains about the loci of g's cortical correlates, for which large multi-cohort analyses are required (Marek et al., 2022).Specifically, there is uncertainty in how much overlap there is in the spatial patterns of g associations with cortical thickness and surface area, measures which are largely phenotypically and genetically distinct (Dickerson et al., 2009;Eyler et al., 2012;Panizzon et al., 2009;Storsve et al., 2014).
Here, we combine (i) postmortem gene expression data and (ii) the largest meta-analysis of the cortico-macrostructural correlates (in vivo MRI; cortical volume, surface area and thickness) of individual differences in cognitive functioning to-date.Both are available at the same level of granularity with respect to brain regions, allowing us to quantitatively assess spatial associations between cortical gene expression and general cognitive function.Therefore, we can ask this new question: is there an association between variation in gene expression across different brain areas and how strongly brain structural measures are associated with cognitive functioning in those same brain regions?That is, does the brain regional map of gene expression resemble the brain regional map of brain structure-cognitive function correlations?
Previous work shows that the expression of genes varies together in a synchronised fashion across the cerebral cortex (Burt et al., 2018(Burt et al., , 2020;;Dear et al., 2022;Hawrylycz et al., 2012;Markello, Arnatkeviciute, et al., 2021).Here, using French and Paus' regional mapping of expression for 8235 genes, two major components account for the majority (49.4%) of the variance in regional gene expression profiles, representing a cell-signalling/modifications axis and a transcription factors axis, and we control for these components to analyse associations of single genes and cell types.We address the potential limitations of having only N = 6 tissue donors and one regional sampling approach: the dimensions of gene expression are validated in two independent gene expression atlases (N = 5, N = 11 tissue donors), and are not driven by a small number of individual outlier regions.Similarly, our meta-analysis of associations between g and regional cortical morphometry (volume, surface area, thickness), across three cohorts (total N = 39,519), shows good cross-cohort consistency in regional mapping.
The patterning of g-associations with brain structural measures across the cortex are associated with both of the identified gene expression components, with medium-to-large effect sizes for g-volume and g-surface area associations but weaker ones for g-thickness.We further identify 29 single genes whose expression patterns are individually associated with g-cortical profiles beyond the two major dimensions of cortical gene expression.Thus, this study provides clarity on the patterning and replicability of the brain-macrostructural correlates of cognitive functioning differences and identifies novel regional global and specific gene expression patterns that might govern them.

| Gene expression method
The Allen Human Brain Atlas is a high-resolution mapping of cortical gene expression for N = 6 donors (five male, one female, age M = 42.50 years, SD = 13.38 years, range = 24-57 years).The complete microarray data from a custom-designed Agilent array for all six donors are openly available for download.French and Paus (2015) summarised these data to the Desikan-Killiany cortical atlas.To briefly summarise their method (for more information, refer to the original paper and Table S6), gene expression values were averaged across multiple probes.Each of the 3702 brain samples was assigned to one of the 68 Desikan-Killany regions based on their MNI coordinates, and then gene expression values were averaged per region, resulting in an expression value for each gene for each region.These betweendonor median expression values are publicly available (French, 2017).
French and Paus also provide a method of quality control for between-donor consistency in regional gene expression profiles (French & Paus, 2015).In this method, profiles with Spearman's ρ > .446(equivalent to one-sided p < .05) between the average of donor-to-median left hemisphere profile correlations are considered to have high between-donor consistency.This method results in the retention of 8325 out of 20,737 genes.
The right hemisphere expression data are based on a maximum of N = 2 donors, compared to a maximum of N = 6 donors for the left hemisphere.The number of samples per region is lower in the in right hemisphere (M = 12.59, SD = 8.90, range 2-34) than the left hemisphere (M = 37.32, SD = 24.37,range = 6-100).Further details of the number of samples and donors per region are in Table S2.The donor-level expression data are not available, so in the present study the 8235 genes that passed the quality control protocol in the left hemisphere were also analysed for the right.There is a strong correlation between the expression values of individual genes between hemispheres (r = 0.997, p < 2.2e 16 , see Figure S2) suggesting that, at the hemisphere level, the relative expression values for the 8235 genes were not affected by the sampling differences between the two hemispheres.
We conducted PCA on the median gene expression values of 8235 genes across 68 cortical regions-rows = cortical regions (in place of participants in a traditional PCA) and columns = genes.
We performed extensive checks for the validity of the first two components-these are detailed in Section 3.
In the raw data, the right hemisphere generally has slightly lower average expression values than the left hemisphere (right: M = 6.035,SD = 2.343, left: M = 6.091,SD = 2.368, t(58.345)= 6.490, p = 2.051e 08 ).This is likely due to comparative lack of power between hemispheres or due to differences in sample compositions, as there is only a maximum of N = 2 donors for the right hemisphere compared to a maximum of N = 6 donors for the left.Alternatively, mean age and sex differences between the two right hemisphere and six left hemisphere donors may have contributed to the differences.Both right hemisphere donors were male, and they include the youngest donor (24 years old, while the other donors' ages range between 31 and 57 years old).There is a clear hemisphere difference centred To confirm it was appropriate to treat these hemispheric differences as an artefact of the data, and thus scale the component scores in each hemisphere, we looked to the Kang et al. (2011) dataset.In this dataset, there was a more even number of donors per hemisphere (left hemisphere M = 9.55, SD = 1.04 donors per region, right hemisphere region M = 7.19, SD = 0.60 donors per region), the same age range for data in both hemispheres (23 to 55 years old) and a more even balance of male and female between hemispheres (left hemisphere: 7 male, 4 female; right hemisphere: 5 male, 4 female).There was no difference in mean expression values per hemisphere t (19.344) = À0.852,p = .405,M left = 7.521 (SD = 1.944),M right = 7.535 (SD = 1.943).For the rotated scores of PC1 (which had a factor congruence of 0.96 with the French and Paus expression matrix), scores were comparable between hemispheres t(19.997)= À0.265,p = .794.Therefore, we deemed it appropriate to scale the component scores separately for each hemisphere in the current dataset (see Figure S3).

| Statistical overrepresentation analysis
To assist with interpretation of the two identified major components of gene expression, PANTHER's protein analysis and GO-Slim molecular, biological and cellular (version 16.0, released 2020-12-01) terms were analysed.All genes included in the PCA were submitted as a reference set for the statistical overrepresentation analysis and 7389 out of 8235 (89%) genes were available in PANTHER, and so were used as the background set.Fisher's exact test and FDR correction were used, and four subsets of genes were tested for statistical overrepresentation: Component 1 loadings < À0.3 (total N = 3367, available N = 3099, 92%) and loadings >0.3 (total N = 2093, available N = 2000, 96%); and Component 2 loadings < À0.3 (total N = 3476, available N = 3234, 93%) and loadings >0.3 (total N = 1705, available N = 1551, 91%).Note: The GO analysis results for a threshold at 0.6 are also available in the Supplementary Tabular Data File.Whilst there are generally fewer FDR significant GO terms (which is expected, as there are as there are fewer genes included than the more lenient 0.3 threshold-Component 1 N = 674 genes >0.6, and 1891 genes <À0.6 and Component 2, N = 471 genes >0.6, and 1815 < À0.6), they are consistent with those from the 0.3 threshold, which suggests that the interpretation of GO terms at 0.3 holds for this more stringent threshold.
The statistical overrepresentation results are provided in full in a supplementary data file.Some genes have absolute loadings >0.3 on both components (N = 3026, 36.75%).There are also a number of genes that had absolute loadings >0.3only on either Component 1 (N = 2438, 29.61%) or Component 2 (N = 2157, 26.19%).N = 614 genes (7.46%) did not load with an absolute >0.3 on either component, and all statistical overrepresentation tests for this set were null.
For the two components, a gene set enrichment analysis was run in FUMA (Functional Mapping and Annotation of Genome-Wide Association Studies, https://fuma.ctglab.nl/).Gene sets were created using previous associations in GWAS of diseases and traits listed in the GWAS catalogue.Hypergeometric tests were performed to test if genes of interest were overrepresented in any of these pre-defined gene sets (those with absolute loadings >0.3 on each component and, separately, those with the more stringent threshold of 0.6), with the 8235 genes as a background set.No significant (α < .05)gene sets were reported.

| Cohorts
The UK Biobank (UKB, http://www.ukbiobank.ac.uk;Sudlow et al., 2015) holds data from $500,000 participants, and for $40,000 at wave 2 of data collection, data includes head MRI scans and cognitive test data.In the current study, we did not include participants if their medical history, taken by a nurse at the data collection appointment, recorded a diagnosis of e.g.dementia, Parkinson's disease, stroke, other chronic degenerative neurological problems or other demyelinating conditions, including multiple sclerosis and Guillain-Barré syndrome, and brain cancer or injury (a full list of exclusion criteria is listed in the Supplementary tabular data file, and see The LBC1936 is a longitudinal study of a sample of communitydwelling older adults most of whom took part in the Scottish Mental Survey of 1947 at $11 years old, and who volunteered to participate in this cohort study at $70 years old (Deary et al., 2007;Taylor et al., 2018;

| MRI protocols
Detailed information for MRI protocols in all three cohorts are reported elsewhere: UKB (Miller et al., 2016), LBC1936 (Wardlaw et al., 2011) and STRADL (Habota et al., 2021), but are briefly summarised here.In the present sample, UKB participants attended one of four testing sites: Cheadle (N = 22,636, 60%), Reading (N = 5463, 14%), Newcastle (N = 9526, 25%) and Bristol (N = 51, 0.14%).The same type of scanner was used in all four testing sites, a 3 T Siemens Skyra, with a 32-channel Siemens head radiofrequency coil.The UKB MRI protocol includes various MRI acquisitions (more details available here https://www.fmrib.ox.ac.uk/ukbiobank/protocol/V4_23092014.All LBC1936 participants were scanned in the same scanner at the Brain Research Imaging Centre, Western General Hospital, Edinburgh, using a GE Signa LX 1.5 T Horizon HDx clinical scanner (General Electric, Milwaukee, WI) with a manufacturer supplied 8-channel phased array head coil.More information on the structural image acquisitions for the LBC1936 cohort is available in (Wardlaw et al., 2011).For T1-weighted images (3D IR-Prep FSPGR), 160 coronal slices were acquired, with a field of view of 256 mm and a matrix size of 192 Â 192 pixels giving a resolution of 1 Â 1 Â 1.3 mm (Vidal-Pineiro et al., 2020).The repetition time was 10 ms, echo time was 4 ms and inversion time was 500 ms.
For all cohorts, the FreeSurfer image analysis suite (http://surfer.nmr.mgh.harvard.edu/)was used for cortical reconstruction and volumetric segmentation.The Desikan-Killany atlas parcellation yields 34 paired regional measures in left and right cortical hemispheres (Desikan et al., 2006).Different versions of FreeSurfer were used in the three cohorts (UKB = v6.0,STRADL = v5.3,LBC1936 = v5.1),and only for UKB were T2-FLAIR volumes used to improve the pial surface reconstruction.The LBC1936 and STRADL parcellations have previously undergone thorough quality control, with manual editing to rectify any issues.Manual edits were performed to ensure correct skull stripping, tissue identification and positioning of cortical regional boundary lines.The UKB regional data were extracted from the aparc.stats files and these parcellations have not been manually or automatically edited.For the current study, UKB values more than 4 standard deviations from the mean for any individual regional measure were excluded (UKB M = 24.28,SD = 19.41,range = 0-104 participants per region).For UKB and STRADL cohorts, cognitive and MRI data were collected on the same day, but in LBC1936, there was a slight delay between the two testing sessions (M = 65.08,SD = 37.77 days).
Raw values are plotted for mean volume, surface area and thickness by age and cohort in Figure S7, and for each region in Figures S8-S14.

| Cognitive tests
All three cohorts have collected data across several cognitive tests, covering several cognitive domains, which enables the estimation of a latent factor of general cognitive functioning (g).The cognitive tests in each cohort have been described in detail elsewhere: UKB (Fawns-Ritchie & Deary, 2020), STRADL (Habota et al., 2021), LBC1936 (Deary et al., 2007;Ritchie et al., 2016;Tucker-Drob et al., 2014).The measures used in the present study are summarised in Tables S10-S12 and correlation plot of cognitive tests within each cohort is available in Figure S5.In STRADL and LBC1936, the cognitive data was used as provided, as this data has been pre-cleaned.For UKB, we coded prospective memory from 0 to 1, as suggested in (Lyall et al., 2016), for numeric memory, values at À1 were removed (abandoned test) and for Trail B, values at 0 were removed (trail not completed).Reaction time, trail B and pairs matching scores were log transformed.
A latent factor of g was estimated for each cohort, using all available cognitive tests, using confirmatory factor analysis in a structural equation modelling framework.Each individual test was corrected for age and sex.Latent g model fits were assessed using the following fit indices: Comparative Fit Index (CFI), Tucker Lewis Index (TLI), Root Mean Square Error of Approximation (RMSEA), and the Root Mean Square Residual (SRMR) (for model fits, see Table S17).For the LBC1936, g has previously been modelled with a hierarchical confirmatory factor analysis approach, to incorporate defined cognitive domains (Ritchie et al., 2016;Tucker-Drob et al., 2014).Here, in keeping with these previous models, within-domain residual covariances were added for four cognitive domains (Visuospatial skills, Crystalised ability, Verbal memory and Processing speed).Results of the g measurement models are summarised in Tables S13-S16, and   Figure S6.For all cohorts, all estimated paths to latent g were statistically significant with all p < .001.
The latent g scores were extracted for all participants.Those for UKB were multiplied by À1 so a higher score reflected better cognitive performance, to match scores from STRADL and LBC1936.Then, for each cohort, a standardised β was estimated between g and three measures of cortical morphometry (volume, surface area and thickness) for each of the 68 regions.Cortical measures were controlled by age, sex head position in the scanner (X, Y and Z coordinates), testing site (for UKB and STRADL) and lag between cognitive and MRI appointments (for LBC1936).The resulting standardised β estimates for each region and each measure were meta-analysed between the three cohorts (68 regions Â 3 measures = 204 random effects metaanalyses).The full results of these meta-analyses are in Tables S18-S20.
Although we controlled for age in the g-cortical morphometry association models within each cohort, each cohort had different age ranges (with the LBC1936 having a notably narrow age-range of 70-74 years old), and it is possible this might affect the associations.Therefore, we also tested for mean age moderation effects on metaanalytic estimates, and none were significant after FDR correction (all FDR Q > .27),see Tables S21-S23.

Additional analyses
In addition to the main analyses, which focus on g-associations with general and specific gene expression profiles, we also ran a parallel supplementary analysis simply on the regional morphometry means (see Supplementary Text 1).

| Analysis software
Most analyses were conducted in R 4.0.2.(R Core Team, 2020).The psych package was used for PCAs (Revelle, 2021), the core R stats package was used for the Kruskal-Wallis tests, the FSA package (Ogle et al., 2021) was used for Dunn's Kruskal-Wallis multiple comparisons, and the metafor (Viechtbauer, 2010) package was used for the meta-analyses.All structural equation models were estimated in lavaan (Rosseel, 2012) with the full information maximum likelihood method.GO term analyses were conducted at http://geneontology. org/, which is powered by PANTHER (Thomas et al., 2003).FUMA https://fuma.ctglab.nl/ was used for gene set enrichment analysis for the two components, and previous GWAS associations with allelic status of the specific individual genes-g associations were looked up in the GWAS catalogue.For spatial correlations, we report both the pvalue directly from the Pearson's correlation and the_permutation p-value, using a regional spin test method in MATLAB (Váša et al., 2017) https://github.com/frantisekvasa/rotate_parcellation,with 1000 permutations.

| Two major dimensions of cortical gene expression
The Allen Human Brain Atlas consists of a high-resolution mapping of gene expression to the cerebral cortex for N = 6 donors (5 male, 1 female, Age M = 42.50 years, SD = 13.38 years, range = 24-57 years).French and Paus (2015) summarised these data across donors to find the average gene expression values for each region in the Desikan-Killiany atlas and provide a method of quality control for between-donor consistency in regional gene expression profiles, which results in retention of 8325 genes (out of 20,737 originally available from the atlas).These retained genes are associated with neural gene ontology (GO) terms, and those not retained tend to have low expression across the cortex or are associated with other GO terms, for example, olfactory receptor and keratin genes (French & Paus, 2015).This results in a gene expression matrix (rows = 68 cortical regions, columns = 8235 genes) of median gene expression values for each region for each gene across donors.Initial results of a principal component analysis (PCA) on these data indicated that regional variation in gene expression across the cortex occurs across very few biological dimensions (see Figure 1b); that is, there was much similarity across genes in the patterning of their expression across brain regions.Mindful of the potential limitations of basing a new discovery in fundamental neuroscience on a modest postmortem dataset (N = 6 donors), we performed extensive checks.
We tested the factor congruence of the resulting principal components in terms of: over-reliance on specific cortical regions, congruence with nine different gene expression data processing pipelines, in two independent samples, and different brain parcellation choices.
First, to test the regional dependence of the principal components, we used cross-validation to create five random partitions of the 68 regions 50 times without replacement.Each time, the partitions were arranged into two sets, one with $54-55 regions (4 of 5 partitions) and the other with $13-14 regions (1 of 5 partitions).The PCA was repeated for each iteration (a total of 250 tests).Absolute coefficients of factor congruence between the two sets tended to be high for the first two components (PC1: M = 0.926, SD = 0.064; PC2: M = 0.830, SD = 0.092), and were notably weaker, with higher variability, from the third component onwards, see Figure 2c.Therefore, the first two components do not rely heavily on individual regions, and so were taken forward in the current analysis.Unrotated, PC1 accounted for 31.9% of the variance, and PC2 for 17.5%, (after varimax rotation PC1 accounts for 25.8% of the variance, and PC2 for 23.6%).
Although there are efforts towards developing standardised processing of gene expression data (Markello, Arnatkeviciute, et al., 2021), there remains no consensus.There have been several proposed pipelines for summarising the Allen Human Brain Atlas data, and so we sought to test whether the gene expression components derived using PCA are valid when the data are summarised with different processing choices.We applied the scripts provided by Markello, Arnatkeviciute, et al. (2021), that reproduce the pipelines for several studies (see Table S6).The initial number of retained genes, and the number of genes matched to French and Paus' postconsistency check genes are in Table S8.We investigated whether the two identified components are similar when different methods of summarising the Allen Human Brain Atlas to Desikan-Killiany space (see Figure 2d).To test this, we replicated the gene expression matrix from French and Paus using Markello et al.'s scripts (Markello, Arnatkeviciute, et al., 2021) and the abagen toolbox (Markello, Shafiei, et al., 2021).This replication is not exact, but very close-8108 genes were retained and the factor congruence coefficient for PC1 = 0.99, and for PC2 = 0.98.We then ran PCAs on the resulting gene expression matrices obtained from nine gene expression data processing pipelines (Markello, Arnatkeviciute, et al., 2021; see Table S6).For each pipeline, we calculated factor congruence coefficients with French and Paus' method based on matched genes.Absolute coefficients ranged from 0.93 to 0.97 for PC1 and 0.24 to 0.72 for PC2 (see Figure 2).Of all 10 pipelines, French and Paus' is the only one to not include gene normalisation (see Table S6), which might account for that impact the number of genes retained (the number of genes retained for these three pipelines range from 6164 to 7800, compared to 8108 for the other pipelines, see Table S6).
We limited the donor age from these additional datasets to be in proximity to the age range of the Allen Human Brain Atlas (24-57 years of age).See Table S1 for descriptive statistics of the validation samples.The test for between-donor consistency provided by The meta-analysed standardised β values of each g-regional morphometry metric (cortical volume, surface area and thickness) associations.(g) Spatial correlations were tested between the brain-regional component scores for gene expression and the regional g $ brain cortical morphometry associations.Then, controlling for the regional component scores, g-associations for individual genes were calculated.
French and Paus (64) was applied to these datasets.Then, PCAs were conducted on the gene expression matrices (rows = cortical regions, columns = genes), and genes were matched with those in the Allen dataset to test for factor congruence. Summaries of the number of retained genes at each step are in Table S7.
To test whether the first two components were generalisable Lastly, we tested whether the positioning of regional boundaries affected the consistency of the components (see Figure 2f).Three open-source atlases were tested: Yeo's Functional Connectivity 7 and 17 Network atlases (with 7 and 17 regions, respectively) (Yeo et al., 2011) and the Destrieux atlas (134 regions, 67 per hemisphere) (Destrieux et al., 2010).For all three, as with the Desikan-Killiany atlas, 8108 genes matched with the 8235 from the main working dataset.
Again, factor congruence coefficients tended to be higher for PC1 Allen congruence coefficients tended to increase for PC2 with increasing number of regions.These results may partially explain why PC2 BrainSpan was less with PC2 Allen (11 regions, less granular) compared to PC2

| Interpretation of gene expression components
To aid interpretation of the two components, we conducted statistical overrepresentation analyses, at http://geneontology.org/, which is Two major components of cortical gene expression.Top and middle panels (Component 1 and Component 2, respectively) left: Regional z scores mapped to the cerebral cortex (scaled for each hemisphere) and right: word clouds of the statistical over-representation results.
The relative direction of component scores is arbitrary (dictated by the PCA), and here, the colour scale is flipped between components so that the directions of upregulation/downregulation match.Bottom panel: Density distribution plots of loadings on Component 1 and Component 2 coloured by cell type.
powered by PANTHER (Thomas et al., 2003).The results suggest that Component 1 represents cell-signalling and post-translational modification processes (with loadings <À0.3 providing upregulation and those >0.3 providing downregulation) (see Figure 3  Descriptive statistics of component loadings for each cell-type are in Table S3 and the results of the Dunn post-hoc tests are in Tables S4 and S5.Generally, the loadings of different cell types tend to be skewed.For Component 1, loadings for all but ependymal and interneuron cell types have an absolute skewness value >0.228.For Component 2, all but endothelial cells have absolute skewness >0.187.This skewness in loadings suggests that specific cell types might play a particular roles in the regulation of the two components.We investigated whether specific cell types load on the two major components in ways that deviate from the aver- Additionally, through FUMA https://fuma.ctglab.nl/we tested whether the genes with absolute loadings >0.3 on each component were significantly related to gene sets in the GWAS catalogue.There were no significant (α = .05)associations for either component, dem- onstrating the highly general nature of the two components of cortical gene expression.

| Regional distribution of component scores across the cerebral cortex
The component scores were scaled to correct for interhemispheric differences in gene expression values (see Section 2 for details).
3.4 | Cortical morphometric associations with general cognitive functioning (g)-meta-analyses (N = 39,519) We first used raw data from three cohorts to estimate regional associ- tests to be correlated, and is generally invariant to cognitive test content, provided that multiple domains are captured (Salthouse, 2005).
These properties lend it well to cross-cohort genetic analyses (Davies et al., 2018), for example, and we leverage them here.
Latent g scores were extracted for all participants, and associations with three measures of cortical morphometry (volume, surface area and thickness) were estimated for each of the 68 regions in each cohort.Cortical measures were controlled by age, sex, head position in the scanner (X, Y and Z coordinates), testing site (for UKB and STRADL) and lag between cognitive and MRI testing appointments (for LBC1936).For UKB, X, Y and Z co-ordinates were calculated relative to one target participant, and for LBC1936 and STRADL, they were taken from the mri_info -cras flag output.We computed standardised β estimates of the association in each brain region between g and each brain morphometric property (volume, surface area, thickness) for each cohort.There were strong cross-cohort correlations for g-associations between the 68 regions for each measure of morphometry (see Table 1).
The current results provide support for theories (Cox et al., 2019;Jung & Haier, 2007) regarding which regions are key in brain morphometry-g associations (e.g., the parieto-frontal integration theory, P-FIT, see Figure S16).Parietal and frontal regions generally have relatively strong g-associations with volume and surface area, though not with cortical thickness.For all three morphometry measures, the superior temporal region had relatively high g-associations mean β (between hemispheres) = 0.163, 0.143 and 0.116, for volume, surface area and thickness respectively.Some of the highest g-volume and gsurface area associations are for the fusiform (mean β (between hemispheres) = 0.154 and 0.126, for volume and surface area, respectively) and inferior parietal region (mean β (between hemispheres) = 0.153 and 0.145, respectively).The precuneus regions also have among the overall highest associations (mean β = 0.136, and β = 0.129), in line with updated reports of regional g-associations (Basten et al., 2015;Cox et al., 2019).
To help interpret why some regions might have higher associations with g than others, we tested correlations between regional gassociations and the regional mean profiles of volume, surface area T A B L E 1 Cross-cohort correlations of regional g-associations.
Cohort comparison g $ volume g $ surface area g $ thickness and thickness (reported in Supplementary Text 1).The regional gassociations were positively associated with the corresponding regional mean profiles for all three morphometry measures.Volume had the strongest correlation (r = 0.709, p 1.35e 11 , p_spin < .0001),followed by surface area (r = 0.614, p = 2.58e 08 , p_spin < .0001),and then thickness (r = 0.313, p = .009,p_spin = .0095).In other words, F I G U R E 4 Meta-analysed brain regional associations with g.(a) Standardised β estimates and FDR Q values for each g $ morphometry association mapped to the cerebral cortex.(b) Meta-analysed standardised β estimates for g-volume, g-surface area and g-thickness.Those for which p < .05are filled in, and those for which p > .05are outlined.The y-axis is ordered by the mean left and right hemisphere meta-analysed β values (decreasing) for each morphometry measure.
regions with stronger g-associations tend to be larger in terms of volume and surface area, and also moderately tend to be thicker.
We then tested whether larger and thicker brain regions are more strongly associated with g because they tend to be better proxies for whole-brain measures (as they contribute more to the total measure).
The magnitude of total brain g-volume and g-surface area associations are in line with the maximum of the individual regions-g-whole cortex volume (β = 0.180, SE = 0.036, p = 5.93e 07 ), g-whole cortex surface area (β = 0.160, SE = 0.021, p 3.93e 14 ).Perhaps as there are some negative g-thickness associations, the g-whole cortex mean thickness association was not significant (β = 0.065, SE = 0.043, p = .13).We then corrected regional g-associations for these total brain measures, and correlated regional profiles with and without correction.These correlations are moderate-to-strong: g-volume (r = 0.613, p = 2.76e 08 ), gsurface area (r = 0.556, p = 8.78e 07 ) and g-thickness (r = 0.945, p < 2.2e 16 ), suggesting that it is not simply because larger/thicker brain regions are a better proxy for the whole brain measure that they are more strongly associated with g.

| Associations between g and components of gene expression
Next, we tested whether brain regions' differences in gene expression (as measured using the two components we described earlier) are correlated with g-morphometry associations.That is, we asked whether brain regions for which morphometric measures (volume, surface area and thickness) are more strongly related to g were also more strongly related to general dimensions of gene expression.
We tested linear correlations between the absolute component scores of gene expression and the meta-analysed standardised β scores for cortical morphometry associations with g, and also report the comparable quadratic regression results with non-absolute scores (see Table 2 and Figure 5).There were negative associations for all analyses, and those for g-volume and g-surface area were moderate-to-strong and statistically significant at the α < .05level.
These results suggest that, generally, regions more strongly associated with g tend to be more balanced between the downregulation and upregulation sides of both cell-signalling/modification and transcription factors components.
There were no correlations between regional mean expression across genes and g-brain morphometry association profiles for which p < .05( g-volume r = À0.023,p = .853;g-surface area r = À0.058,p = .640;g-thickness r = 0.103, p = .403),demonstrating the value of the PCA approach as associations between genome-wide dimensions of expression and g are not reducible to an average brain-wide pattern of gene expression.

| Associations between g and individual genes
Lastly, we tested for individual gene-g expression pattern associations, controlling for the two general components of gene expression.For all 8235 individual genes, the median expression scores per region were scaled separately for left and right hemisphere regions, to account for sample-based artefacts in hemisphere differences in expression values, in line with the method for the component scores.
After FDR correction (threshold = Q < .05),there were 522 individual genes whose cortical patterning was correlated with g-volume patterning, 609 with g-surface area and 516 with g-thickness (these results are available in detail in the supplementary data file, and Figures S18-S20).Two hundred and sixty eight genes were shared between g-volume and g-surface area, 253 between g-volume and gthickness and 42 between g-surface area and g-thickness.Forty one genes with FDR Q < .05overlapped for all three morphometry measures, and of these, spin-test p-values were < .05for all three morphometry measures for 29 genes (jβj range = 0.18 to 0.53, see Figure 6).These 29 genes are particularly likely candidate substrates of cognition.Some regional expression profiles have positive associations with g-cortical measure profiles, while others have negative associations.For discussion, genes with negative g-associations in the present study are marked with an asterisk.
Several of these 29 genes have been previously reported to be associated with Alzheimer's Disease: overexpression of ANGPT1* has been found to increase amyloid beta secretion (Peng et al., 2020) whilst CLEC4G suppresses amyloid beta (Kizuka et al., 2015), and increased levels of ENC1 (Gns et al., 2022;White et al., 2017) and NPTX2 (Belbin et al., 2020;Libiger et al., 2021;Shao et al., 2020;Xiao et al., 2017)  Note: Pearson's r values for the correlation between brain-g associations and brain-gene expression component profiles.Note that these are for linear associations using the absolute gene-expression component scores.
Results for the equivalent associations using the non-absolute components scores (quadratic component) are presented in Table S24 and Figure 5, which illustrates the balance between downregulation and upregulation.
and Aβ42/40 ratio) (Quillen et al., 2023), and DTNB is an indicator of extent of neuronal injury and inflammation in Alzheimer's disease (Neumann et al., 2022).CYP26B1 is upregulated in the prefrontal cortex (Shibata et al., 2021), and there are links between its catabolism in the hippocampus and poor cognitive outcomes in mice (Wołoszynowska-Fraser et al., 2021).
F I G U R E 5 Associations between regional-g profiles and the two gene expression components.LOESS functions are plotted (the quadratic model results are comparable to the absolute score correlations and are presented in Table S24).A vertical line at component scores of 0 represents a balance between upregulation and downregulation ends of each component.The colour scale is flipped between Components, so that the direction of downregulation and upregulation match.
Other individual genes associated with g-cortical profiles have been associated with cognitive functioning more generally, for example, RGS9* has been implicated in motor coordination and working memory (Blundell et al., 2008).Some others have been linked to cognitive disorders, for example LYRM4 has been linked to schizophrenia (Jablensky et al., 2012).Previous significant GWAS associations with these 29 genes were identified in the GWAS catalogue and are available in the supplementary data file.We also ran a protein network analysis on the 29 identified genes through STRING (Szklarczyk et al., 2023)

| Associations between g and cell types
Then, we used our discovery of what is common about regional cortical gene expression profiles to identify specific cell type-cognitive relationships.The raw expression values by the nine cell types and their correlations are shown in Figure S17.The mean profiles of the nine specific cell types were scaled in each hemisphere, and we controlled for the two major components of gene expression (detailed regression results are in Table S25) to assess their associations with g-

| DISCUSSION
This study reveals and validates a fundamental organisation principle of cortical gene expression patterns across the human brain.We then use this information to identify the shared and specific aspects of regional cortical gene expression and show that they are associated with regional brain-structure correlates of complex thinking skills.We We conducted one of the largest analyses of g-cortical morphometry associations to-date.These associations are generally in line with the parieto-frontal integration theory (P-FIT; Jung & Haier, 2007) and strengthens support for the involvement of regions (e.g., temporal, precuneus) that were not included in earlier iterations of the model.
There was strong agreement across the three cohorts in the magnitudes and spatial patterning of associations, which speaks to the validity of g, as a measure of cognitive functioning (indicated by different cognitive tests included by each cohort).The consistency of results also indicates that careful harmonisation of image processing alongside careful attention to phenotype measurement may partially offset the apparent need for many thousands of participants to obtain replicable brain-behaviour association results (Marek et al., 2022).
Turning to the gene expression components-g correlations, the more strongly regions were associated with g, the more they tended towards the balance between the downregulation and upregulation sides of the cell-signalling/modification and transcription factors components.Complex cognitive functions therefore may be facilitated at a midpoint of downregulation and upregulation of each of these components.Other regions that fall on either the downregulation or upregulation sides of each of the two major dimensions perhaps specialise in less general functions.An important question this raises, but we cannot answer here, is whether individual differences in the balance of gene expression in these cortical dimensions might partly explain why people differ from each other in their general cognitive functioning.
Using this newly gleaned information about what is common among gene expression profiles, we then identified 29 individual genes whose spatial expression was correlated with g cortical patterning, independent of the two major dimensions.Whereas some of the genes strongly indicated in the two major dimensions themselves could also pertain, causally, to mechanisms and processes underpinning g, the nature of shared expression patterns as presented here disallows that direct inference for individual genes.In contrast, these 29 genes with specific associations are particularly strong candidates for playing a role in facilitating complex cognitive processes.Several of these genes have been previously identified as associated with various cognitive outcomes, whilst others are potentially less wellresearched substrates of cognition.Both the current gene patterning approach and GWAS can be used to identify gene-cognition associations.GWAS examines common genetic variants and their association with cognitive traits, whereas a gene expression patterning approach encompasses the effects of many common and rare genetic variants along with environmental influences.Due to these differences, these approaches may identify different genes.Additionally, in this study, a gene expression approach allows us to look specifically at gene expression within the brain.
After testing the g-associations of the major components of gene expression, we turned to specific cell type-g associations.Microglia and ependymal cells both had negative associations with cognitive morphometry measures-ependymal cells with volume and thickness, and microglia with volume and surface area.These two cell types both play key roles in waste removal from the brain, which might explain these negative associations-it could be that some regions specialise in fundamental brain maintenance processes, such as waste management, thus enabling others to specialise in cognitive processes.This study has several strengths and limitations.As we quantitatively demonstrate, the present approach surpasses candidate gene and median expression information, clarifying our understanding of the molecular substrates of complex cognitive abilities in the human brain.We extensively validate the discovered components of gene expression, mitigating concerns that this finding might be an artefact of a small number of donors in a single sample.The first component is highly consistent across different datasets, gene expression data processing and summary choices, and brain regional parcellation choices.
Although the second component does not depend heavily on individual regions, it is somewhat affected by the granularity and boundaries of the parcellation, gene expression data sampling and processing choices, and the number of genes retained.While efforts continue to standardise gene expression processing pipelines (Markello, Arnatkeviciute, et al., 2021), the effects of different choices on dimensions of between-gene covariances should continue to be considered.
As donor contributions to gene expression databases continue to increase, brain regional summaries of gene expression will become more precise.Several genes were excluded from the current analysis due to low between-donor consistency.Although this is partially due to some of these genes having generally low expression across the cortex, there also appears to be an effect of the sampling methods of gene expression data.Gene expression sampling methods consistent with clear cortical boundaries and full cortical coverage will increase between-donor consistency in regional gene expression profiles and enable stronger tests of external validity.Additionally, future research should consider whether major dimensions of regional cortical gene expression, such as those reported in the current paper, are consistent between postmortem and in vivo data (Liharska et al., 2023), as this is likely to affect the interpretation of results.
We leveraged the fact that the UKB, STRADL and LBC1936 cohorts have adopted comparable methods, including similar MRI processing pipelines with FreeSurfer http://surfer.nmr.mgh.harvard.edu/,and collection of various cognitive test scores, which enabled us to harmonise the processing and approach to the calculation of g.Consistency in the applied methods between cohorts allows for direct quantitative comparison.Despite these advantages, there were also some differences between MRI data and processing the three cohorts, which might differentially affect the cortical surface results: (i) each of the three cohorts used different scanners for MRI acquisition and, although T1-weighted data provides consistent between-scanner measures (Buchanan et al., 2021), we cannot rule out scanner-specific differences, (ii) Desikan-Killiany parcellations were visually inspected and manually edited for LBC1936 and STRADL, but not for UKB (outliers SD >4 were excluded) and (iii) different FreeSurfer versions were used for each cohort, which is likely to have contributed to some differences in estimations, alongside different types and quantity of cognitive tests.However, high between-cohort correlations suggest that these differences may not meaningfully affect the current results and provide evidence in support the use of g in meta-analytic studies to reach reproducible brain-cognition associations (Nikolaidis et al., 2022).
A separate limitation of this study is that all included participants were in relatively good health, as we chose to exclude participants with declared neurological conditions.It is therefore not clear that the reported regional g-associations would generalise to clinical populations.Additionally, whereas the cognitive-MRI data do not include childhood and adolescence (and therefore the results may not relate directly to those parts of the life span), the good adulthood age coverage, absence of age moderation of the meta-analytic estimates, and clear agreement across cohorts suggests that the wellpowered results reliably capture adulthood brain-g correlations.

| CONCLUSION
In summary, this newly possible study uses robust methods to advance our understanding of how gene expression is associated with complex cognitive functioning.We discovered and interpreted two general components of cortical gene expression, and identified general and specific patterns of gene expression that are candidate substrates that may contribute to some of the association between brain structure and complex cognitive functioning.

Figure
Figure S4 for N by exclusion condition).After these exclusions, the final study included N = 37,840 participants (53% female), age M = 63.81 years (SD = 7.64 years), range = 44-83 years.The UKB was given ethical approval by the NHS National Research Ethics Service Northwest (reference 11/NW/0382).The current analyses were conducted under UKB application number 10279.All participants provided informed consent.More information on the consent procedure can be found at https://biobank.ctsu.ox.ac.uk/crystal/label.cgi?id= 100023.STRADL is a population-based study, developed from the Generation Scotland Scottish Family Health Study.Participants who had taken part in the Generation Scotland Scottish Family Health Study were invited back to take part in this additional study, which was initially designed to study major depressive disorder, although participants were not selected based on the presence of depression (Habota et al., 2021; https://www.research.ed.ac.uk/en/datasets/stratifyingresilience-and-depression-longitudinally-stradl-a-dep).Data are available for N = 1188 participants.The current sample includes N = 1043 participants, for whom both MRI head scans and cognitive data are available (60% female), age M = 59.29 years (SD = 10.12 years), range = 26-84 years.STRADL received ethical approval from the NHS Tayside Research ethics committee (reference 14/SS/0039), and all participants provided written informed consent.
pdf) but relevant to this work are the T1-weighted MPRAGE and T2-FLAIR volumes.For T1-weighted images, 208 sagittal slices were acquired with a field view of 256 mm and a matrix size of 256 Â 256 pixels, giving a resolution of 1 Â 1 Â 1 mm 3 .The repetition time was 3.15 ms and the echo time was 1.37 ms.STRADL had two testing sites: Aberdeen (in the present sample, N = 528, 51%) and Dundee (N = 515, 49%).Detailed information about the STRADL structural image acquisitions is available here https://wellcomeopenresearch.org/articles/4-185.For the current analysis, we used the T1-weighted fast gradient echo with magnetisation preparation volume sequence.The Aberdeen site used a 3 T Philips Achieva TX-series MRI system (Philips Healthcare, Best, Netherlands) with a 32-channel phased-array head coil and a back facing mirror (software version 5.1.7;gradients with maximum amplitude 80 mT/m and maximum slew rate 100 T/m/s).For T1-weighted images, 160 sagittal slices were acquired with a field of view of 240 mm and a matrix size of 240 Â 240 pixels, giving a resolution of 1 Â 1 Â 1 mm(Vidal-Pineiro et al., 2020).Repetition time was 8.2 ms, echo time was 3.8 ms and inversion time was 1031 ms.In Dundee, the scanner was a Siemens 3 T Prisma-FIT (Siemens, Erlangen, Germany) with 20 channel head and neck phased array coil and a back facing mirror (Syngo E11, gradient with max amplitude 80 mT/m and maximum slew rate 200 T/m/s).For T1-weighted images 208 sagittal slices were acquired with a field of view of 256 mm and matrix size 256 Â 256 pixels giving a resolution of 1 Â 1 Â 1 mm(Vidal-Pineiro et al., 2020).Repetition time was 6.80 ms, echo time was 2.62 ms, and inversion time was 900 ms.
the generally lower agreement of PC2 with other pipelines.Markello, Arnatkeviciute, et al. (2021) report that, with the Allen Atlas data, different choices of gene normalisation methods have the largest impact on results.Additionally, there is particularly low agreement for PC2 with certain pipelines-Burt, et al. (2018),Anderson, et al. (2020), Liu, et al. (2020) andMarkello, Arnatkeviciute, et al. (2021)-which are likely due to specific choices such as donor-specific probe selection, stringent interareal similarity filtering thresholds, and other choices

1
An illustration of the analytic framework.(a) Gene expression data from the Allen Human Brain Atlas were summarised to the Desikan-Killiany Atlas.(b) We conducted PCA on the gene expression matrices (68 regions * 8235 genes) and two components were justified with validity checks.(c) We rotated these two components, and the component scores show the relative positions of the 68 Desikan-Killany regions on these components.(d) g $ brain cortical morphometry associations were calculated for three cohorts.(e) The g $ brain cortical morphometry associations were meta-analysed with random effects models.(f) beyond the six donors from which the Allen Human Brain Atlas data were derived, we sought external validation with two independent datasets, the BrainSpan Atlas https://www.brainspan.org/ and an atlas provided byKang et al. connected  to the Human Brain Transcriptome Project https://hbatlas.org/(Kang et al., 2011).Both external datasets used the Affymetrix GeneChip Human Exon 1.0 ST Array Platform to summarise gene expression data.and include 11 cortical regions, which have previously been roughly matched to 14 regions in the Desikan Killiany atlas(Wong et al., 2018) (see FigureS1).For BrainSpan (donor N = 5), these are collapsed across hemispheres, but for theKang et al. dataset (donor N = 11), they are available for each hemisphere separately (a total of 22 regions).Genes with consistent between-donor profiles were identified, using French and Paus' procedure(French & Paus, 2015).To test for factor congruence, these were then matched with the 8235 genes that were consistent between donors in the French and Paus dataset, resulting in 2250 genes for the BrainSpan comparison and 908 for Kang et al.The relatively small numbers of retained genes could be due to different cortical boundaries, extent of cortical coverage or the gene expression measurement and sampling methods used.There was high factor congruence for PC1 Allen in both datasets (the coefficient for PC1 BrainSpaN = 0.88 and PC1 Kang et al. = 0.96) and low-moderate factor congruency for PC2 Allen (with PC2 BrainSpaN = 0.24, and PC3 Kang et al. = 0.63; see Figure 2e).PC2 Kang et al. did not have high factor congruence with any Allen component (the maximum absolute value was 0.19, which was with PC6 Allen ).
than PC2 Allen , (PC1 Yeo7 = 0.94, PC1 Yeo17 = 0.96, PC1 Destrieux = 0.80; PC2 Yeo7 = 0.18, PC2 Yeo17 = 0.35, PC2 Destrieux = 0.65).Notably, factor F I G U R E 2 Validating gene expression components.(a) Raw gene expression values for the 34 regions for the left and right hemispheres, for the 8235 consistent genes, ordered by mean region.(b) Correlation plot of the 8235 genes across the 68 cortical regions (8235*8235).(c) Absolute factor congruence coefficients for the first 10 components between 'train' and 'test' folds ($54-55 regions, and $12-13 regions), over 50 repetitions.(d) Absolute factor congruence coefficients from different pipelines with PC1 and PC2 of the current dataset of interest, using the Desikan-Killiany atlas.* denotes that PC3 from that pipeline is compared with PC2.(e) Absolute factor congruence coefficients for two external datasets with PC1 and PC2 of the current dataset of interest.* denotes that PC3 from is compared with PC2.(f) Absolute factor congruence coefficients for three alternative parcellations with the PC1 and PC2 of the current dataset (which uses the Desikan-Killiany atlas).
hemispheric correlation in scores between the 34 paired regions for both Component 1 (r = 0.815, p = 4.411e 09 ) and Component 2 (r = 0.725, p = 1.25e 06 ).The direction of upregulation and downregulation of the loadings (as determined by GO analysis) informed whether the regional component scores suggested upregulation or downregulation of the two components.For Component 1, negative loadings suggest upregulation, and positive suggest downregulation; conversely for Component 2, positive loadings suggest upregulation and negative suggest downregulation.Parietal and occipital regions are on the upregulation side of Component 1 (cell signalling/modification), with frontal and temporal regions indicating downregulation.For Component 2 (transcription factors), lateral frontal areas tend towards balance between upregulation and downregulation, whereas medial frontal regions tend towards downregulation and parietal and occipital regions towards upregulation.For both components and in both hemispheres, the highest absolute scores are observed in the medial occipital regions (pericalcarine, cuneus and lingual regions), which fall strongly on the upregulation side of both components.
(Search Tool for the Retrieval of Interacting Genes/Proteins), with a minimum required interaction score of 'medium confidence' (0.400), and the background set of 8235 genes.Three proteins were not available in STRING, so there were 26 proteins involved in the analysis.The PPI enrichment p-value is .506,with the expected number of edges being 2, and the observed number of edges, also 2, showing that the network does not have significantly more interactions than expected.There are two edges-both connections are sourced from text mining and co-expression: one is between MYL3* and SMPX and the other is between ANGPT1* and IL6R*.Overall, these results suggest that there are not widespread interactions between these top genes, which further validates the aim of establishing unique signals beyond general gene expression patterns.
and the supplementary data file for full GO results and component loadings).